Capturing Heterogeneity in Gene Expression Studies by “Surrogate Variable Analysis” Jeffrey
Authors
Abstract
The false discovery rate (FDR) has been discussed extensively, and it has been pointed out that the distribution of the null p-values must be “correct” or conservative for FDR estimation, or any other standard statistical significance measure, to behave properly. The null p-values are correct when they are Uniformly distributed on the interval (0,1). They are conservative when they are pushed towards 1 relative to the Uniform(0,1). P-values are constructed to have the Uniform distribution under the null hypothesis, and if this cannot be done exactly, a conservative version is calculated [1]. In a simulation study where the right answer is known, there is no off-the-shelf approach to test whether the null p-values have a proper distribution. In this study, we apply a Kolmogorov-Smirnov (KS) test to the set of null p-values to detect deviation from the Uniform. However, we want to test whether this holds over many repeated simulations, to avoid “getting lucky” on one particular simulated data set. If the set of null p-values is Uniform, then the p-value resulting from the KS test should also follow the Uniform distribution. Therefore, by collecting the KS test p-values over all simulations, we can apply a second KS test to verify that these are themselves Uniformly distributed. Here we have employed this nested KS test to compare the relative behavior of each multiple testing procedure discussed. If the quantiles of the KS test p-values follow the diagonal line in a quantile-quantile plot against the quantiles of the Uniform distribution, then this is very strong evidence that the p-values resulting from the procedure are “correct.”
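The nested KS test described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `nested_ks`, the simulation sizes, and the way the null p-values are generated (direct Uniform draws, and Uniform draws raised to the power 0.5 to mimic a conservative procedure) are all assumptions made for the example.

```python
import numpy as np
from scipy import stats

def nested_ks(null_pvalue_sets):
    """Nested KS test (hypothetical helper, not from the paper).

    Inner step: one KS test per simulation, comparing that simulation's
    null p-values with Uniform(0,1).
    Outer step: one KS test comparing the collection of inner KS p-values
    with Uniform(0,1).
    """
    inner = np.array(
        [stats.kstest(p, "uniform").pvalue for p in null_pvalue_sets]
    )
    outer = stats.kstest(inner, "uniform").pvalue
    return inner, outer

rng = np.random.default_rng(0)
n_sims, n_tests = 200, 500  # assumed sizes, chosen only for illustration

# Stand-in for a procedure with correct null p-values: exact Uniform draws.
uniform_sets = [rng.uniform(size=n_tests) for _ in range(n_sims)]

# Stand-in for a conservative procedure: U**0.5 is pushed towards 1,
# since P(U**0.5 <= x) = x**2 <= x on (0,1).
conservative_sets = [rng.uniform(size=n_tests) ** 0.5 for _ in range(n_sims)]

inner_unif, outer_unif = nested_ks(uniform_sets)
inner_cons, outer_cons = nested_ks(conservative_sets)

print("outer KS p-value, Uniform nulls:     ", outer_unif)
print("outer KS p-value, conservative nulls:", outer_cons)
```

Note that the KS test is two-sided, so in this sketch a conservative set of p-values is flagged as a deviation from the Uniform just like an anti-conservative one; telling the two apart requires looking at the direction of the deviation, for example in the quantile-quantile plot.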
Similar resources
Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis
It has unambiguously been shown that genetic, environmental, demographic, and technical factors may have substantial effects on gene expression levels. In addition to the measured variable(s) of interest, there will tend to be sources of signal due to factors that are unknown, unmeasured, or too complicated to capture through simple models. We show that failing to incorporate these sources of h...
Summary and discussion of: “Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis”
Gene expression study is well known to focus on finding association between expression levels of particular genes and some interesting variables, for example, a disease state. In such studies, besides the primary variable of interest, some other covariates are usually measured and included in the model of association tests. However, it is not possible to measure all the variables related to gen...
Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies
MOTIVATION: In a typical gene expression profiling study, our prime objective is to identify the genes that are differentially expressed between the samples from two different tissue types. Commonly, standard analysis of variance (ANOVA)/regression is implemented to identify the relative effects of these genes over the two types of samples from their respective arrays of expression levels. But, ...
Genetic polymorphism and expression analysis of cMBL gene in Iranian native and commercial chickens
The aims of this study were to compare the promoter sequence of the mannose-binding lectin (cMBL) gene in Iranian native and commercial chicken strains; as well as to compare the cMBL gene expression in crossbred and inbred chickens. In total 79 native (Western Azerbaijan native fowls, WANF) and 49 commercial (Arian Commercial Strain, ACS) birds were reared as parents under same management prac...
Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction
MOTIVATION: Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry potential concern of removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes ...
I-43: Identification of SOX3 as an XX Male Sex Reversal Gene in Mice and Humans
Background: Mammals utilise an XX/XY system of sex determination in which the Y-linked gene SRY (Sex-determining region Y) exerts a dominant masculinising influence on sexual development. Sex chromosome homology and comparative sequence studies suggest that SRY evolved from the related SOX3 gene on the X chromosome, although there is no direct functional evidence to support this hypothesis. The ...